AITopics | observation prompt

Collaborating Authors

observation prompt

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

3191170938b6102e5c203b036b7c16dd-Paper-Conference.pdf

Neural Information Processing SystemsFeb-10-2026, 19:40:48 GMT

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.67)

Industry:

Information Technology (0.68)
Leisure & Entertainment > Games > Computer Games (0.46)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Add feedback

Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations

Neural Information Processing SystemsOct-9-2025, 22:42:38 GMT

As Large Language Models (LLMs) are integrated into critical real-world applications, their strategic and logical reasoning abilities are increasingly crucial.

agent, llm, opponent, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.67)

Industry:

Government (0.93)
Information Technology (0.68)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations

Duan, Jinhao, Zhang, Renming, Diffenderfer, James, Kailkhura, Bhavya, Sun, Lichao, Stengel-Eskin, Elias, Bansal, Mohit, Chen, Tianlong, Xu, Kaidi

arXiv.org Artificial IntelligenceJun-10-2024

As Large Language Models (LLMs) are integrated into critical real-world applications, their strategic and logical reasoning abilities are increasingly crucial. This paper evaluates LLMs' reasoning abilities in competitive environments through game-theoretic tasks, e.g., board and card games that require pure logic and strategic reasoning to compete with opponents. We first propose GTBench, a language-driven environment composing 10 widely recognized tasks, across a comprehensive game taxonomy: complete versus incomplete information, dynamic versus static, and probabilistic versus deterministic scenarios. Then, we (1) Characterize the game-theoretic reasoning of LLMs; and (2) Perform LLM-vs.-LLM competitions as reasoning evaluation. We observe that (1) LLMs have distinct behaviors regarding various gaming scenarios; for example, LLMs fail in complete and deterministic games yet they are competitive in probabilistic gaming scenarios; (2) Most open-source LLMs, e.g., CodeLlama-34b-Instruct and Llama-2-70b-chat, are less competitive than commercial LLMs, e.g., GPT-4, in complex games, yet the recently released Llama-3-70b-Instruct makes up for this shortcoming. In addition, code-pretraining greatly benefits strategic reasoning, while advanced reasoning methods such as Chain-of-Thought (CoT) and Tree-of-Thought (ToT) do not always help. We further characterize the game-theoretic properties of LLMs, such as equilibrium and Pareto Efficiency in repeated games. Detailed error profiles are provided for a better understanding of LLMs' behavior. We hope our research provides standardized protocols and serves as a foundation to spur further explorations in the strategic reasoning of LLMs.

agent, llm, opponent, (14 more...)

arXiv.org Artificial Intelligence

2402.12348

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report > Experimental Study (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments

Chen, Junzhe, Hu, Xuming, Liu, Shuodi, Huang, Shiyu, Tu, Wei-Wei, He, Zhaofeng, Wen, Lijie

arXiv.org Artificial IntelligenceFeb-26-2024

Recent advancements in large language models (LLMs) have revealed their potential for achieving autonomous agents possessing human-level intelligence. However, existing benchmarks for evaluating LLM Agents either use static datasets, potentially leading to data leakage or focus only on single-agent scenarios, overlooking the complexities of multi-agent interactions. There is a lack of a benchmark that evaluates the diverse capabilities of LLM agents in multi-agent, dynamic environments. To this end, we introduce LLMArena, a novel and easily extensible framework for evaluating the diverse capabilities of LLM in multi-agent dynamic environments. LLMArena encompasses seven distinct gaming environments, employing Trueskill scoring to assess crucial abilities in LLM agents, including spatial reasoning, strategic planning, numerical reasoning, risk assessment, communication, opponent modeling, and team collaboration. We conduct an extensive experiment and human evaluation among different sizes and types of LLMs, showing that LLMs still have a significant journey ahead in their development towards becoming fully autonomous agents, especially in opponent modeling and team collaboration. We hope LLMArena could guide future research towards enhancing these capabilities in LLMs, ultimately leading to more sophisticated and practical applications in dynamic, multi-agent settings. The code and data will be available.

agent, llm, observation prompt, (15 more...)

arXiv.org Artificial Intelligence

2402.16499

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Texas (0.05)
Europe > Middle East (0.04)
(9 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback